
Can I get phenotype, gender and family relationship information for the 1000 Genomes samples?

Answer:

For the 1000 Genomes Project, no phenotype information was collected for any of the samples, because the data are freely available. All donors were over 18 and declared themselves to be healthy at the time of collection. We do provide a sample spreadsheet and a pedigree file, which contain ethnicity and gender information for the 1000 Genomes samples; the pedigree file also records family relationships.


Can I volunteer to be part of the 1000 Genomes Project?

Answer:

The 1000 Genomes Project is not accepting volunteers to be sequenced. For more information about how samples were recruited, please see the About page.

Another large-scale resequencing project that still has rounds of recruitment is the Personal Genome Project.


Is there any gene expression data available for the 1000 Genomes Project samples?

Answer:

The most important existing expression datasets involving 1000 Genomes individuals are probably the following:

RNAseq (mRNA & miRNA) on 465 individuals (CEU, TSI, GBR, FIN, YRI)

Pre-publication RNA-sequencing data from the Geuvadis project is available through http://www.geuvadis.org

http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/samples.html
http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-2/samples.html

RNAseq on 60 CEU individuals [1]

http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-197

Expression arrays on about 800 HapMap 3 individuals, many of whom are also 1000 Genomes samples [1,2]

http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-198
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-264

RNAseq for 69 YRI individuals [3]

http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-19480

References

  1. Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010 Apr 1;464(7289):773-7. Epub 2010 Mar 10.
  2. Stranger BE, Montgomery SB, Dimas AS, Parts L, Stegle O, Ingle CE, Sekowska M, Davey Smith G, Evans D, Gutierrez-Arcelus M, Price A, Raj T, Nisbett J, Nica AC, Beazley C, Durbin R, Deloukas P, Dermitzakis ET. Patterns of cis regulatory variation in diverse human populations. PLoS Genetics. In press.
  3. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010 Apr 1;464(7289):768-72. Epub 2010 Mar 10.


What do your population codes like CEU or TSI mean?

Answer:

These codes represent our populations: each three-letter code stands for a different population. For example, CEU means Northern Europeans from Utah and TSI means Tuscans from Italy. A summary of all these codes is available both in a README on the FTP site and in the answer to the question "Which populations are part of your study?"


Which populations are part of your study?

Answer:

Our study includes 26 populations from many different locations around the globe. The following table lists these populations and indicates which data are currently available for each of them.

| Population Code | Population Description | Super Population Code | Sequence Data Available | Alignment Data Available | Variant Data Available |
|---|---|---|---|---|---|
| CHB | Han Chinese in Beijing, China | EAS | Yes | Yes | Yes |
| JPT | Japanese in Tokyo, Japan | EAS | Yes | Yes | Yes |
| CHS | Southern Han Chinese | EAS | Yes | Yes | Yes |
| CDX | Chinese Dai in Xishuangbanna, China | EAS | Yes | Yes | Yes |
| KHV | Kinh in Ho Chi Minh City, Vietnam | EAS | Yes | Yes | Yes |
| CEU | Utah Residents (CEPH) with Northern and Western European Ancestry | EUR | Yes | Yes | Yes |
| TSI | Toscani in Italia | EUR | Yes | Yes | Yes |
| FIN | Finnish in Finland | EUR | Yes | Yes | Yes |
| GBR | British in England and Scotland | EUR | Yes | Yes | Yes |
| IBS | Iberian Population in Spain | EUR | Yes | Yes | Yes |
| YRI | Yoruba in Ibadan, Nigeria | AFR | Yes | Yes | Yes |
| LWK | Luhya in Webuye, Kenya | AFR | Yes | Yes | Yes |
| GWD | Gambian in Western Divisions in the Gambia | AFR | Yes | Yes | Yes |
| MSL | Mende in Sierra Leone | AFR | Yes | Yes | Yes |
| ESN | Esan in Nigeria | AFR | Yes | Yes | Yes |
| ASW | Americans of African Ancestry in SW USA | AFR | Yes | Yes | Yes |
| ACB | African Caribbeans in Barbados | AFR | Yes | Yes | Yes |
| MXL | Mexican Ancestry from Los Angeles USA | AMR | Yes | Yes | Yes |
| PUR | Puerto Ricans from Puerto Rico | AMR | Yes | Yes | Yes |
| CLM | Colombians from Medellin, Colombia | AMR | Yes | Yes | Yes |
| PEL | Peruvians from Lima, Peru | AMR | Yes | Yes | Yes |
| GIH | Gujarati Indian from Houston, Texas | SAS | Yes | Yes | Yes |
| PJL | Punjabi from Lahore, Pakistan | SAS | Yes | Yes | Yes |
| BEB | Bengali from Bangladesh | SAS | Yes | Yes | Yes |
| STU | Sri Lankan Tamil from the UK | SAS | Yes | Yes | Yes |
| ITU | Indian Telugu from the UK | SAS | Yes | Yes | Yes |

These populations have been divided into five super populations:

  • AFR, African
  • AMR, Admixed American
  • EAS, East Asian
  • EUR, European
  • SAS, South Asian

The code ALL means that all individuals from the given release are being considered.
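The code table above maps naturally onto a simple lookup. The following is a minimal Python sketch, transcribed by hand from the table; the dictionary `POPULATION_TO_SUPER` and the function `populations_for` are hypothetical helpers, not part of any 1000 Genomes tool. It expands a super population code, or ALL, into its member population codes.

```python
from typing import Dict, List

# Hypothetical helper: population code -> super population code,
# transcribed by hand from the population table above.
POPULATION_TO_SUPER: Dict[str, str] = {
    "CHB": "EAS", "JPT": "EAS", "CHS": "EAS", "CDX": "EAS", "KHV": "EAS",
    "CEU": "EUR", "TSI": "EUR", "FIN": "EUR", "GBR": "EUR", "IBS": "EUR",
    "YRI": "AFR", "LWK": "AFR", "GWD": "AFR", "MSL": "AFR", "ESN": "AFR",
    "ASW": "AFR", "ACB": "AFR",
    "MXL": "AMR", "PUR": "AMR", "CLM": "AMR", "PEL": "AMR",
    "GIH": "SAS", "PJL": "SAS", "BEB": "SAS", "STU": "SAS", "ITU": "SAS",
}


def populations_for(code: str) -> List[str]:
    """Expand a population, super population or ALL code into population codes.

    A single population code is returned as a one-element list.
    """
    code = code.upper()
    if code == "ALL":
        # ALL covers every population in the table.
        return sorted(POPULATION_TO_SUPER)
    if code in POPULATION_TO_SUPER:
        return [code]
    members = sorted(p for p, s in POPULATION_TO_SUPER.items() if s == code)
    if not members:
        raise ValueError(f"unknown population code: {code}")
    return members
```

For example, `populations_for("AMR")` returns `['CLM', 'MXL', 'PEL', 'PUR']`, and `populations_for("ALL")` returns all 26 codes.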


Which samples are you sequencing?

Answer:

A list of the samples that are part of the project is available from this spreadsheet. A pedigree file is also available from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_g1k.ped
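The pedigree file can be read with a few lines of Python to recover each sample's sex and parents. This is a minimal sketch under stated assumptions: it assumes the file is tab-separated with a header row containing columns named `Family ID`, `Individual ID`, `Paternal ID`, `Maternal ID`, `Gender` and `Population`, that sex uses the usual PED coding (1 = male, 2 = female), and that `0` marks a parent who is not listed. Please check these against the downloaded file; `read_pedigree` and `children_of` are hypothetical helper names.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Optional, TextIO

# ASSUMPTION: tab-separated file with a header row containing the column
# names used below; verify them against the downloaded file.
SEX_CODES = {"1": "male", "2": "female"}  # usual PED coding, assumed here


@dataclass
class Sample:
    family_id: str
    individual_id: str
    father: Optional[str]  # None when no father is listed
    mother: Optional[str]  # None when no mother is listed
    sex: str
    population: str


def _parent(value: str) -> Optional[str]:
    # PED files conventionally use "0" for a parent who is not in the file.
    value = value.strip()
    return None if value in ("", "0") else value


def read_pedigree(handle: TextIO) -> Dict[str, Sample]:
    """Parse a tab-separated pedigree file into {individual ID: Sample}."""
    samples: Dict[str, Sample] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = Sample(
            family_id=row["Family ID"],
            individual_id=row["Individual ID"],
            father=_parent(row["Paternal ID"]),
            mother=_parent(row["Maternal ID"]),
            sex=SEX_CODES.get(row["Gender"].strip(), "unknown"),
            population=row["Population"],
        )
        samples[sample.individual_id] = sample
    return samples


def children_of(samples: Dict[str, Sample], parent_id: str) -> List[str]:
    """Return IDs of individuals whose father or mother is parent_id."""
    return sorted(s.individual_id for s in samples.values()
                  if parent_id in (s.father, s.mother))


if __name__ == "__main__":
    # Hypothetical three-person family, for illustration only.
    example = (
        "Family ID\tIndividual ID\tPaternal ID\tMaternal ID\tGender\tPopulation\n"
        "FAM1\tDAD1\t0\t0\t1\tCEU\n"
        "FAM1\tMOM1\t0\t0\t2\tCEU\n"
        "FAM1\tKID1\tDAD1\tMOM1\t2\tCEU\n"
    )
    peds = read_pedigree(io.StringIO(example))
    kid = peds["KID1"]
    print(kid.sex, kid.father, kid.mother)  # female DAD1 MOM1
    print(children_of(peds, "MOM1"))  # ['KID1']
```

Using `csv.DictReader` means any additional columns in the real file are simply carried along in each row and ignored by this sketch.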

Please note that the sample spreadsheet also lists samples that are related to sequenced samples but were not themselves sequenced. If a sample has no data in the Total LC and Total E Sequence columns, it was not sequenced for the main project.
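If the spreadsheet is exported to CSV, the samples that were not sequenced for the main project can be picked out by checking that both sequence columns are empty. This is a minimal Python sketch assuming a CSV export; the header names `Sample`, `Total LC Sequence` and `Total E Sequence` are placeholders to adjust to the real export, and `unsequenced_samples` is a hypothetical helper.

```python
import csv
import io
from typing import List, Sequence, TextIO

# HYPOTHETICAL header names: adjust them to match the actual CSV export.
SAMPLE_COL = "Sample"
SEQUENCE_COLS = ("Total LC Sequence", "Total E Sequence")


def unsequenced_samples(
    handle: TextIO,
    sample_col: str = SAMPLE_COL,
    sequence_cols: Sequence[str] = SEQUENCE_COLS,
) -> List[str]:
    """Return samples with no data in any of the sequence columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in (sample_col, *sequence_cols) if c not in header]
    if missing:
        # Fail loudly rather than silently treating every sample as unsequenced.
        raise KeyError(f"columns not found: {missing}")
    result = []
    for row in reader:
        # Not sequenced for the main project only if every column is blank.
        if all(not (row[col] or "").strip() for col in sequence_cols):
            result.append(row[sample_col])
    return result


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    example = (
        "Sample,Total LC Sequence,Total E Sequence\n"
        "S1,100,\n"
        "S2,,\n"
        "S3,,5\n"
    )
    print(unsequenced_samples(io.StringIO(example)))  # ['S2']
```

A sample with data in only one of the two columns counts as sequenced here, so only rows that are blank in both are returned.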
